    A new method for interoperability between lexical resources using MDA approach

    Lexical resources are increasingly multiplatform due to the diverse needs of linguists. Merging, comparing, finding correspondences and deducing differences between these lexical resources remain difficult tasks, so interoperability between them is hard, or even impossible, to achieve. In this context, we establish a new method based on the MDA approach to resolve interoperability between lexical resources. The proposed method consists in building a common structure (an OWL-DL ontology) for the resources involved. This common structure lets the involved resources communicate, so that we can create a complex grid between them allowing transformation from one format to another. We tested the resulting method on an LMF lexicon.
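
    A minimal sketch of this pivot idea, assuming rdflib and an invented namespace: the common structure is declared as a tiny OWL-DL ontology, and entries from any source lexicon are projected into it as instances. The class and property names (LexicalEntry, hasLemma) are illustrative, not the ontology built in the paper.

```python
# A minimal sketch, assuming rdflib; names below are illustrative.
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

PIVOT = Namespace("http://example.org/pivot#")  # hypothetical namespace

def build_pivot_ontology() -> Graph:
    """Declare a tiny OWL-DL common structure shared by all source formats."""
    g = Graph()
    g.bind("pivot", PIVOT)
    g.add((PIVOT.LexicalEntry, RDF.type, OWL.Class))
    g.add((PIVOT.hasLemma, RDF.type, OWL.DatatypeProperty))
    g.add((PIVOT.hasLemma, RDFS.domain, PIVOT.LexicalEntry))
    return g

def project_entry(g: Graph, entry_id: str, lemma: str) -> None:
    """Map one entry from a source lexicon into the common structure."""
    node = PIVOT[entry_id]
    g.add((node, RDF.type, PIVOT.LexicalEntry))
    g.add((node, PIVOT.hasLemma, Literal(lemma, lang="ar")))

g = build_pivot_ontology()
project_entry(g, "entry_001", "كتاب")   # "kitab" (book)
print(g.serialize(format="turtle"))
```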

    A prototype for projecting HPSG syntactic lexica towards LMF

    The comparative evaluation of Arabic HPSG grammar lexica requires a deep study of their linguistic coverage. The complexity of this task results mainly from the heterogeneity of the descriptive components within those lexica (underlying linguistic resources and different data categories, for example). It is therefore essential to define more homogeneous representations, which in turn will enable us to compare the lexica and eventually merge them. In this context, we present a method for comparing HPSG lexica based on a rule system. This method is implemented within a prototype for the projection from Arabic HPSG to a normalised pivot language compliant with LMF (ISO 24613, Lexical Markup Framework) and serialised using a TEI (Text Encoding Initiative) based representation. The design of this system is based on an initial study of the HPSG formalism, examining its adequacy for the representation of Arabic; from this, we identify the feature structures corresponding to each Arabic lexical category and their possible LMF counterparts.
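
    The sketch below illustrates what such a rule-based projection could look like, assuming a dictionary of projection rules and TEI dictionary elements (entry, form, gramGrp, gram); the rule table and feature names are invented for the example, not the prototype's actual rule system.

```python
# Illustrative rule-based projection; the rule table and TEI layout are
# assumptions for the sketch, not the prototype's actual rule system.
import xml.etree.ElementTree as ET

# Each rule maps an HPSG (feature, value) pair to an LMF data category.
PROJECTION_RULES = {
    ("HEAD", "noun"): ("partOfSpeech", "noun"),
    ("HEAD", "verb"): ("partOfSpeech", "verb"),
    ("AGR", "fem"): ("grammaticalGender", "feminine"),
}

def project(hpsg_entry: dict, lemma: str) -> ET.Element:
    """Build a TEI <entry> carrying LMF-style grammatical features."""
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form", type="lemma")
    ET.SubElement(form, "orth").text = lemma
    gram_grp = ET.SubElement(entry, "gramGrp")
    for feature, value in hpsg_entry.items():
        rule = PROJECTION_RULES.get((feature, value))
        if rule:
            category, normalized = rule
            ET.SubElement(gram_grp, "gram", type=category).text = normalized
    return entry

tei = project({"HEAD": "noun", "AGR": "fem"}, "مدرسة")   # "madrasa" (school)
print(ET.tostring(tei, encoding="unicode"))
```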

    Segmentation tool for hadith corpus to generate TEI encoding

    A segmentation tool for a hadith corpus is necessary to prepare the TEI hadith encoding process. In this context, we aim to develop a tool for segmenting hadith texts from the Sahih al-Bukhari corpus. To achieve this objective, we start by identifying the different hadith structures. Then, we elaborate an automatic processing tool for hadith segmentation. This tool will be integrated into a prototype supporting the TEI encoding process. The experimentation and evaluation of this tool are based on the Sahih al-Bukhari corpus. The obtained results were encouraging despite some flaws related to exceptional cases of hadith structure.
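
    A minimal sketch of marker-based segmentation, assuming the conventional isnad/matn division of a hadith and a heuristic split at the last reported-speech marker; the marker list and the heuristic are illustrative, not the tool's actual rules.

```python
# A minimal sketch, assuming an isnad/matn split at the last occurrence of
# the reported-speech marker; markers and heuristic are illustrative only.
import re

# Transmission verbs that typically open an isnad (chain of narrators).
ISNAD_MARKERS = re.compile(r"^(حدثنا|أخبرنا|حدثني)")

def segment_hadith(text: str) -> dict:
    """Split a hadith into isnad (chain) and matn (body) at the last
    reported-speech marker 'قال' (qala, "he said")."""
    text = text.strip()
    last_qala = text.rfind("قال")
    if not ISNAD_MARKERS.match(text) or last_qala == -1:
        return {"isnad": "", "matn": text}   # no chain detected
    return {"isnad": text[:last_qala].strip(),
            "matn": text[last_qala:].strip()}

# Simplified, unvocalized opening hadith of Sahih al-Bukhari.
print(segment_hadith("حدثنا الحميدي قال حدثنا سفيان قال إنما الأعمال بالنيات"))
```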

    A standard TMF modeling for Arabic patents

    Patent applications are similarly structured worldwide. They consist of a cover page, a specification, claims, drawings (if necessary) and an abstract. In addition to their content (text, numbers and citations), all patent publications contain a relatively rich set of well-defined metadata. In the Arab world, there is no North African or Arabian Intellectual Property Office and therefore no uniform collection of Arabic patents. In Tunisia, for example, there is no digital collection of patent documents and therefore no XML collections. In this context, we aim to create a TMF standardized model for scientific patents and to develop a generator of XML patent collections with a uniform, easy-to-use structure. To test our approach, we will use a collection of XML scientific patent documents in three languages (Arabic, French, and English).
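
    The snippet below sketches the kind of uniform XML record such a generator could emit for one patent; the element names (patent, coverPage, claims) are hypothetical placeholders, not the TMF model proposed in the paper.

```python
# Sketch of a uniform XML patent record; element names are hypothetical
# placeholders, not the TMF model proposed in the paper.
import xml.etree.ElementTree as ET

def build_patent_record(meta: dict, claims: list, abstract: str) -> ET.Element:
    """Assemble one patent document: cover-page metadata, claims, abstract."""
    patent = ET.Element("patent", lang=meta.get("lang", "ar"))
    cover = ET.SubElement(patent, "coverPage")
    for field in ("applicationNumber", "title", "applicant", "filingDate"):
        ET.SubElement(cover, field).text = meta.get(field, "")
    claims_el = ET.SubElement(patent, "claims")
    for i, claim in enumerate(claims, start=1):
        ET.SubElement(claims_el, "claim", n=str(i)).text = claim
    ET.SubElement(patent, "abstract").text = abstract
    return patent

record = build_patent_record(
    {"applicationNumber": "TN-0001", "title": "Example title", "lang": "ar"},
    ["A device comprising ..."],
    "A short abstract of the invention.",
)
print(ET.tostring(record, encoding="unicode"))
```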

    Automatic construction of a TMF Terminological Database using a transducer cascade

    The automatic development of terminological databases, especially in a standardized format, is crucial for multiple applications related to technical and scientific knowledge that require semantic and terminological descriptions covering multiple domains. In this context, we face two challenges: the first is the automatic extraction of terms in order to build a terminological database; the second is their normalization into a standardized format. To deal with these challenges, we propose an approach based on a cascade of transducers built with the CasSys tool of the Unitex platform, which benefits both from the success of the rule-based approach to term extraction and from the suitability of the TMF standard for the representation of terms. We tested and evaluated our approach on Arabic scientific and technical documents from the elevator domain, and the results are very encouraging.
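
    The toy cascade below imitates the CasSys principle in plain Python: each stage is a transducer-like pass whose output feeds the next stage, ending with a TMF-style serialisation. The English patterns stand in for the Arabic Unitex graphs and are purely illustrative.

```python
# Toy cascade in the spirit of CasSys; patterns are illustrative stand-ins.
import re

def stage_mark_candidates(text: str) -> str:
    """Stage 1: bracket noun-phrase term candidates (toy patterns)."""
    return re.sub(r"\b(safety (?:gear|brake)|machine room|door)\b",
                  r"<term>\1</term>", text)

def stage_filter_short(text: str) -> str:
    """Stage 2: unwrap single-word candidates kept by stage 1."""
    return re.sub(r"<term>(\w+)</term>", r"\1", text)

def stage_to_tmf(text: str) -> str:
    """Stage 3: serialise the surviving candidates as TMF-style entries."""
    terms = re.findall(r"<term>(.+?)</term>", text)
    return "\n".join(
        f'<termEntry><langSet xml:lang="en"><tig><term>{t}</term></tig>'
        f"</langSet></termEntry>" for t in terms
    )

def run_cascade(text: str) -> str:
    for stage in (stage_mark_candidates, stage_filter_short, stage_to_tmf):
        text = stage(text)        # each transducer feeds the next one
    return text

print(run_cascade("The safety gear stops the cab when the door opens."))
```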

    Encoding prototype of Al-Hadith Al-Shareef in TEI

    The standardization of Al-Hadith Al-Shareef can guarantee interoperability and interchangeability with other textual sources and takes the processing of Hadith corpora to a higher level. Still, research on Hadith corpora has not previously treated standardization as a real objective, especially with respect to standards such as TEI (Text Encoding Initiative). In this context, we aim at the standardization of Al-Hadith Al-Shareef on the basis of the TEI guidelines. To achieve this objective, we elaborated a TEI model customized for the Hadith structure. We then developed a prototype for encoding Hadith texts: it analyses them and automatically generates a standardized TEI version of each Hadith. The evaluation of the TEI model and the prototype is based on a Hadith corpus collected from Sahih al-Bukhari. The obtained results were encouraging despite some flaws related to exceptional cases of Hadith structure.
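
    As a rough sketch of the prototype's output, the snippet below wraps one segmented hadith in a TEI div; the type values ("hadith", "isnad", "matn") are hypothetical customizations, not necessarily those of the paper's TEI model.

```python
# Rough sketch of TEI output for one hadith; the div/seg type values are
# hypothetical customizations, not necessarily the paper's TEI model.
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def encode_hadith(number: int, isnad: str, matn: str) -> ET.Element:
    """Wrap one hadith in a TEI <div> with separate isnad and matn segments."""
    ET.register_namespace("", TEI_NS)
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "hadith", "n": str(number)})
    seg_isnad = ET.SubElement(div, f"{{{TEI_NS}}}seg", {"type": "isnad"})
    seg_isnad.text = isnad
    seg_matn = ET.SubElement(div, f"{{{TEI_NS}}}seg", {"type": "matn"})
    seg_matn.text = matn
    return div

div = encode_hadith(1, "حدثنا الحميدي عن سفيان", "إنما الأعمال بالنيات")
print(ET.tostring(div, encoding="unicode"))
```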

    Towards modeling Arabic lexicons compliant LMF in OWL-DL

    Elaborating reusable lexical databases and, especially, making interoperability operational are crucial tasks affecting both Natural Language Processing (NLP) and the Semantic Web. In this respect, we consider that modeling the Lexical Markup Framework (LMF) in the Web Ontology Language Description Logics (OWL-DL) can be a beneficial step towards these aims. This proposal has broad relevance since it concerns LMF, the reference standard for modeling lexical structures. In this paper, we study the requirements of this suggestion. We first give a quick presentation of the LMF framework. Next, we describe the three ontology definition sublanguages aimed at specific classes of users: OWL Lite, OWL-DL and OWL Full. After comparing the three, we choose to work with OWL-DL. We then describe the steps needed to model LMF in OWL. Finally, we apply this model to develop an instance for an Arabic lexicon.
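
    A small sketch of the mapping idea, assuming rdflib: LMF core classes become OWL-DL classes, the meta-model's aggregation links become properties, and one Arabic entry is instantiated. The URIs and names are illustrative, not the model built in the paper.

```python
# Sketch of the LMF-to-OWL-DL mapping idea, assuming rdflib; URIs and
# property names are illustrative, not the model built in the paper.
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

LMF = Namespace("http://example.org/lmf#")   # hypothetical namespace
g = Graph()
g.bind("lmf", LMF)

# LMF core meta-model classes become OWL-DL classes.
for cls in ("LexicalEntry", "Lemma", "Sense"):
    g.add((LMF[cls], RDF.type, OWL.Class))

# Aggregation links of the meta-model become object/datatype properties.
g.add((LMF.hasLemma, RDF.type, OWL.ObjectProperty))
g.add((LMF.hasLemma, RDFS.domain, LMF.LexicalEntry))
g.add((LMF.hasLemma, RDFS.range, LMF.Lemma))
g.add((LMF.writtenForm, RDF.type, OWL.DatatypeProperty))

# One instance from a hypothetical Arabic lexicon.
g.add((LMF.entry_kataba, RDF.type, LMF.LexicalEntry))
g.add((LMF.entry_kataba, LMF.hasLemma, LMF.lemma_kataba))
g.add((LMF.lemma_kataba, RDF.type, LMF.Lemma))
g.add((LMF.lemma_kataba, LMF.writtenForm, Literal("كتب", lang="ar")))

print(g.serialize(format="turtle"))
```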

    Toward the Resolution of Arabic Lexical Ambiguities with Transduction on Text Automaton

    Lexical analysis can be a way to remove ambiguity from Arabic text, and the resolution of ambiguity is an important task in several Natural Language Processing (NLP) applications. Our proposed resolution method is essentially based on the use of transducers applied to text automata. These transducers specify lexical and contextual rules for Arabic and thereby allow the resolution of lexical ambiguities. Different types of lexical ambiguities are identified and studied in order to extract an appropriate set of rules. We then describe the lexical rules in the ELAG system (Elimination of Lexical Ambiguities by Grammars), which can delete the paths representing morphosyntactic ambiguities. In addition, we present an experiment carried out on the Unitex platform with various linguistic resources to obtain disambiguated syntactic structures suitable for parsing. The obtained results are promising and can be improved by adding further rules and heuristics.
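
    The toy example below illustrates the principle of pruning a text automaton: the automaton is represented as a lattice of alternative morphological analyses per token, and a contextual rule removes inconsistent paths. The rule is invented for the example; real ELAG grammars are considerably richer.

```python
# Toy ELAG-style pruning over a text automaton, represented here as a
# lattice of alternative analyses per token; the rule below is invented.
from itertools import product

# "ذهب الولد": dhahaba (went, VERB) vs. dhahab (gold, NOUN) + al-walad (boy).
lattice = [
    [("ذهب", "VERB"), ("ذهب", "NOUN")],
    [("الولد", "NOUN")],
]

def prefer_verb_reading(path) -> bool:
    """Invented contextual rule: reject the NOUN+NOUN path in this context."""
    (_, tag1), (_, tag2) = path
    return not (tag1 == "NOUN" and tag2 == "NOUN")

# Keep only the paths of the automaton that satisfy every rule.
paths = [p for p in product(*lattice) if prefer_verb_reading(p)]
print(paths)   # the ambiguous NOUN+NOUN path has been deleted
```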

    An Arabic Probabilistic Parser based on a Property Grammar

    The specificities of Arabic parsing, such as agglutination, vocalization and the relatively free word order of Arabic sentences, remain a major issue to consider. To promote robustness, such a parser should take into account different types of constraints. The Property Grammar (PG) formalism verifies the satisfiability of constraints directly on the units of the structure, thanks to its properties (or relations). In this context, we propose to build a probabilistic parser with syntactic properties using a PG, and we weight the production rules using different kinds of implicit information, in particular the syntactic properties. We experimented with our parser on the ATB treebank using the CYK parsing algorithm and obtained encouraging results. Our method also automates the implementation of most property types. Its generalization to other languages or corpus domains (using treebanks) is a promising perspective, and its combination with pre-trained BERT models may also make our parser faster.
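
    A compact sketch of probabilistic CYK over a toy grammar in Chomsky normal form, with a hook where Property Grammar scores could weight each rule; the grammar, the probabilities and the property_score stub are illustrative, not the paper's trained model.

```python
# Probabilistic CYK sketch over a toy PCFG; grammar, probabilities and the
# property_score hook are illustrative, not the paper's trained model.
from collections import defaultdict

# Toy PCFG in Chomsky normal form: (lhs, rhs) -> probability.
binary = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 1.0}
lexical = {("NP", "الولد"): 0.5, ("NP", "الكتاب"): 0.5, ("V", "قرأ"): 1.0}

def property_score(lhs, span):
    """Hook for PG constraints (linearity, requirement, ...); neutral here."""
    return 1.0

def cyk(words):
    n = len(words)
    chart = defaultdict(dict)      # chart[(i, j)][symbol] = best probability
    for i, w in enumerate(words):
        for (sym, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][sym] = p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, (b, c)), p in binary.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        prob = p * chart[(i, k)][b] * chart[(k, j)][c]
                        prob *= property_score(lhs, (i, j))
                        if prob > chart[(i, j)].get(lhs, 0.0):
                            chart[(i, j)][lhs] = prob
    return chart[(0, n)].get("S", 0.0)

print(cyk(["الولد", "قرأ", "الكتاب"]))   # "the boy read the book" -> 0.25
```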